Abstract: As the mobile application landscape expands, wireless
networks are tasked with supporting multiple connection
profiles, including real-time communications and delay-sensitive
traffic. Among many ensuing engineering challenges is the
need to better understand the fundamental limits of forward
error correction in non-asymptotic regimes. This article seeks
to characterize the performance of block codes over finite-state
channels with memory. In particular, classical results from
information theory are revisited in the context of channels with
rare transitions, and bounds on the probabilities of decoding
failure are derived for random codes. This study offers new
insights about the potential impact of channel correlation over
time on overall performance.

I. Introduction 

As preferred mobile devices shift to advanced smartphones 
and tablet personal computers, the demand for low-latency, 
high-throughput wireless service increases rapidly. The shared 
desire for a heightened user experience, which includes real-time
applications and mobile interactive sessions, acts as a
motivation for the study of highly efficient communication 
schemes subject to stringent delay constraints. An important 
aspect of delay-sensitive traffic stems from the fact that its 
intrinsic delivery requirements preclude the use of asymptotically
long codewords. As such, the insights offered by classical
information theory are of limited value in this context. 

This article focuses on deriving meaningful performance 
limits for delay-aware systems operating over channels with 
memory. The emphasis is put on identifying upper bounds on 
the probabilities of decoding failure for systems employing 
short block-lengths. This is an essential intermediate step in 
characterizing the queueing behavior of contemporary communication
systems, and it forms the primary goal of our inquiry.

A distinguishing feature of our approach is the accent on 
channels with memory and state-dependent operation. More 
specifically, we are interested in regimes where the block 
length is of the same order or smaller than the channel 
memory. Mathematically, we wish to study the scenario where 
the mixing time of the underlying finite-state channel is 
similar to the time necessary to transmit a codeword. This 
leads to two important phenomena. First, the state of the 
channel at the onset of a transmission has a significant 
impact on the empirical distribution of the states within a 
codeword transmission cycle. Second, channel dependencies 
extend beyond the boundaries of individual codewords. This 
is in stark contrast with block-fading models; for instance, 
in our proposed framework, decoding failure events can be 
strongly correlated over time. 



Computing probabilities of decoding failures for specific
communication channels and fixed coding schemes is of fundamental
interest. This topic has received significant attention
in the past, with complete solutions in some cases. This line
of work dates back to the early days of information theory [1].
An approach that has enjoyed significant success, popularized
chiefly by Gallager, consists in deriving exponential error
bounds on the behavior of asymptotically long codewords [2].
Such bounds have been examined for memoryless channels
as well as finite-state channels with memory. In general, they
can become reasonably tight for long yet finite block lengths.
It is worth mentioning that the subject of error bounds has
also appeared in more recent studies, with the advent of new
approaches such as dispersion and the uncertainty-focusing
bound [3], [4], [5], [6].

In standard asymptotic frameworks, channel parameters are 
kept constant while the length of the codeword increases 
to infinity. While these approaches lead to mathematically 
appealing characterizations, they also have the side effect that 
the resulting bounds on error probability do not depend on 
the initial or final states of the channel. This situation can be 
attributed to the fact that, no matter how slow the mixing time 
of the underlying channel is, the length of the codeword eventually
far exceeds this quantity. Therefore, the initial and final
states of the channel become inconsequential. Unfortunately,
this situation diminishes the value of the corresponding results
for queueing models. Often, in practical scenarios, the service
requirements imposed on a communication link force the use
of short codewords, with no obvious time-scale separation
between the duration of a codeword and the mixing time of
the underlying channel.

This reality, together with the increasing popularity of real-time
applications on wireless networks, demands a novel
approach where the impact of initial conditions is preserved
throughout the analysis. A suitable methodology should be
able to capture both the effects of channel memory and
the impact of the channel state at the onset of a codeword.
An additional benefit of the slow mixing regime is the ability
to track dependencies from codeword to codeword, which
intrinsically lead to correlation in decoding failure events
and can therefore greatly affect the perceived service quality
from a queueing perspective. In this article, we establish
the underpinnings of error probability analysis in the
rare-transition regime.

The goal of deriving upper bounds on the probability of
decoding failure for rare transitions is to characterize overall
performance for systems that transmit data using block lengths
on the order of the coherence time of their respective channels.
This article addresses the problem of deriving Gallager-type
exponential bounds on the probability of decoding failure
in the rare-transition regime. By construction, these bounds
necessarily depend on the initial and final states of the channel.
The analysis is conducted for the scenario where channel state
information is available at the receiver. Our results are then
compared to the probability of decoding failure obtained for
a Gilbert-Elliott channel under a minimum distance decoder
and the maximum-likelihood decision rule [4], [5], [6].

Figure 1. The Gilbert-Elliott model is the simplest non-trivial instantiation of
a finite-state channel with memory. State evolution over time forms a Markov
chain, and the input-output relationship of this binary channel is governed by
a state-dependent crossover probability, as illustrated above.

After computing the exponential upper bound on error
probability, we consider the rare-transition regime, in which
the state transition probabilities decay with $N$ so that the
expected number of transitions within a block of length $N$
stays constant. We apply this condition to the error exponent and
analyze the results for different values of $N$.

II. Modeling and Exponential Bound 

The models considered in this article belong to the general 
class of finite-state channels where the state transitions are 
independent of the input. Such channels have been used 
extensively in the information theory literature, and they need
very little in the way of introduction. On the other hand, it is
important to establish a proper notation. In this article, the state
of the channel at time $n$ is denoted by $S_n$ and takes values in a
finite set. The corresponding input and output are represented
by $X_n$ and $Y_n$, respectively. Capital letters are used for random
variables whereas lower case letters designate elements. In
general, the input-output relationship is governed by Gallager's
model, with the conditional probability distribution of the form

\[
P(y_n, s_n \mid x_n, s_{n-1})
= \Pr(Y_n = y_n, S_n = s_n \mid X_n = x_n, S_{n-1} = s_{n-1}).
\]

In our work, we assume that state transitions are independent
of the input so that the distribution can be factored into two
parts,

\begin{align}
P(y_n, s_n \mid x_n, s_{n-1})
&= P(s_n \mid s_{n-1}) P(y_n \mid x_n, s_n) \tag{1} \\
&= \Pr(S_n = s_n \mid S_{n-1} = s_{n-1}) \Pr(Y_n = y_n \mid X_n = x_n, S_n = s_n). \notag
\end{align}

The proverbial example for a channel with memory that
features this structure is the famed Gilbert-Elliott model, which
is governed by a two-state ergodic discrete-time Markov chain
(DTMC), and is illustrated in Fig. 1. In this case, the channel
evolution forms a Markov chain with transition probability
matrix

\[
P = \begin{bmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{bmatrix},
\]

where $\alpha$ and $\beta$ are the probabilities of leaving the first
and second states, respectively. The input-output relation induced
by state $s$ can be written as

\[
\Pr(Y_n = X_n \mid S_n = s) = 1 - \varepsilon_s,
\]

where $\varepsilon_s$ designates the state-dependent crossover probability.
By convention, we label states so that $\varepsilon_1 < \varepsilon_2$. We call
the first state the good state, denoted by subscript $g$, and
the second one the bad state, denoted by subscript $b$.
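The channel model described above can be exercised with a short simulation. The following is a minimal sketch, not the authors' code: the function name, the state labels `"g"` and `"b"`, and the choice to apply the crossover of the current state before drawing the next state are all assumptions made for illustration.

```python
import random


def simulate_gilbert_elliott(bits, alpha, beta, eps_g, eps_b, s0="g", rng=None):
    """Pass `bits` through a two-state Gilbert-Elliott channel.

    Returns (outputs, states), where states[n] is the state in force
    while bits[n] was transmitted. All names here are illustrative.
    """
    rng = rng or random.Random()
    state = s0
    outputs, states = [], []
    for x in bits:
        # The crossover probability is set by the state occupied during this use.
        eps = eps_g if state == "g" else eps_b
        flip = rng.random() < eps
        outputs.append(x ^ int(flip))
        states.append(state)
        # Markov transition: g -> b with probability alpha, b -> g with probability beta.
        if state == "g":
            state = "b" if rng.random() < alpha else "g"
        else:
            state = "g" if rng.random() < beta else "b"
    return outputs, states
```

With small `alpha` and `beta` relative to the block length, the returned state sequence tends to stay in its initial state for most of the block, which is the regime this article is concerned with.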

A mathematical approach that has proven exceptionally
useful in information theory is the use of random codes.
Following this tradition, we adopt a random coding scheme
that employs a code ensemble $\mathcal{C}$ with $M = e^{NR}$ elements,
where $R$ denotes the rate of the code and $N$ represents the
common block length. The elements in $\mathcal{C}$ are indexed by
$i \in \{1, \ldots, M\}$. Let $Q(x)$ be an arbitrary distribution on the set of
possible input symbols. Throughout, we assume that codewords
are selected independently using the corresponding product
distribution, $\Pr(\underline{X}(i) = \underline{x}) = Q_N(\underline{x}) = \prod_{n=1}^{N} Q(x_n)$. A
message is transmitted to the destination by selecting one of
the codewords. We wish to upper bound the probability that
this codeword is decoded erroneously at the receiver, while
also preserving partial state information.

Assumption 1. Communication takes place over a finite-state
channel that admits the conditional decomposition of (1).
Information is transmitted using the coding scheme described
above. Furthermore, the state of the channel is perfectly known
at the receiver.

For completeness, we reproduce below a celebrated result 
that we use extensively thereafter; a detailed proof can be 
found in the associated reference. 

Theorem 1 (Section 5.6, [2]). Let $P_N(\underline{y} \mid \underline{x})$ be the transition
probability assignment for sequences of length $N \geq 1$ on a
discrete channel. Suppose that maximum-likelihood decoding
is employed. Then, the average probability of decoding error
over this ensemble of codes is bounded, for any choice of $\rho$,
$0 \leq \rho \leq 1$, by

\[
P_e \leq (M-1)^{\rho} \sum_{\underline{y}} \Big[ \sum_{\underline{x}} Q_N(\underline{x}) P_N(\underline{y} \mid \underline{x})^{1/(1+\rho)} \Big]^{1+\rho}.
\]

This theorem is quite general; it applies to channels with
memory and, in particular, slow-mixing channels. $P_N(\cdot \mid \cdot)$ is a
generic conditional probability distribution that can represent
the probability distribution induced by a specific channel
realization, for instance. We are now ready to present our first
pertinent result.

Consider a channel realization that leads to the sequence
$s^N = (s_1, \ldots, s_N)$. Moreover, let $T = T(s^N)$ denote the
empirical distribution of this sequence.

Proposition 1. Suppose Assumption 1 holds. For any $\rho \in
[0, 1]$, the probability of decoding failure at the destination,
conditioned on $S^N = s^N$, is bounded by

\[
P_{e \mid s^N} \leq \exp\big(-N (E_{0,N}(\rho, Q_N, s^N) - \rho R)\big)
\]

where

$E_{0,N}(\rho, Q_N, s^N)$

Proof: Applying Theorem 1 to this specific scenario and
following the same argument as in [2, Section 5.5], we get

where $P_{N \mid s^N}(\underline{y} \mid \underline{x}) = P(\underline{y} \mid \underline{x}, s^N)$ is the conditional
distribution of receiving $\underline{y}$ given $\underline{x}$ and $s^N$, and the first equality
follows from (1). A key insight is to realize that this function
only depends on $s^N$ through its empirical distribution
$T = T(s^N)$. The proposition is then obtained by substituting
this expression into equation (5.6.1) in [2] and noticing that
$e^{-\rho N R} = M^{-\rho}$. ■

Corollary 1. Again, suppose Assumption 1 holds and let $u \subseteq
\{s^N \mid T(s^N) = T\}$ be such that $\Pr(u) > 0$. For any $\rho \in [0, 1]$,
the probability of decoding failure, conditioned on $s^N \in u$, is
also bounded by

\[
P_{e \mid u} \leq \exp\big(-N (E_{0,N}(\rho, Q_N, s^N) - \rho R)\big).
\]

Proof: Using the equivalence of the bound for all channel
realizations with the same empirical distribution, we get

\begin{align*}
P_{e \mid u} &= \sum_{s^N \in u} P_{e \mid s^N} \Pr(S^N = s^N \mid s^N \in u) \\
&\leq \sum_{s^N \in u} e^{-N (E_{0,N}(\rho, Q_N, s^N) - \rho R)} \Pr(S^N = s^N \mid s^N \in u) \\
&= e^{-N (E_{0,N}(\rho, Q_N, s^N) - \rho R)}.
\end{align*}
■

An interesting subset $u$ is one where the sequences start from
a prescribed state $s_0$ and end in state $s_N$, while possessing the
right empirical distribution.

Theorem 2. Under Assumption 1, the probability of decoding
failure and ending in state $s_N$, conditioned on starting in $s_0$,
can be upper bounded by

\[
\sum_{T} e^{-N (E_{0,N}(\rho, Q_N, T) - \rho R)} P_{T, s_N \mid s_0}, \tag{3}
\]

where $P_{T, s_N \mid s_0}$ denotes the probability that the state sequence
has empirical distribution (type) $T$ and ends in $s_N$, conditioned on the
initial state $s_0$, and $E_{0,N}(\rho, Q_N, T)$ is given by Proposition 1.



Proof: Following the same approach as in [2, Section 5.9],
we define

\[
a(s_{n-1}, s_n) = \sum_{y_n} \Big( \sum_{x_n} Q(x_n) \big( P(s_n \mid s_{n-1}) P(y_n \mid x_n, s_{n-1}) \big)^{1/(1+\rho)} \Big)^{1+\rho}.
\]

By Proposition 1 we have

\[
\prod_{n=1}^{N} a(s_{n-1}, s_n) = \Pr(S^N = s^N \mid s_0) \exp\big[-N E_{0,N}(\rho, Q_N, s^N)\big].
\]

Then we can write the upper bound on error probability as

This upper bound consists of the exponential upper bounds
conditioned on the channel type, averaged over all state
sequence types. It means that for each type we have an
exponential decay with block length in the error probability.
Note that $P_{T, s_N \mid s_0}$ does not decay exponentially with $N$.

As an example, we now compute this upper bound for the
Gilbert-Elliott channel. Let $n_g$ and $n_b = N - n_g$ be the number
of times that the channel is in the good and bad states during
the transmission of a codeword of length $N$, respectively.
These numbers are also referred to as the occupation times of
the channel or the channel state type (see Chapter 12 in [9]).
Then $\eta_g = n_g / N$ and $\eta_b = n_b / N$ are the fractions of time spent in each state.
The fractional type of the channel state sequence is another
name to describe the fraction of time spent in each state. For
example, type (0.5, 0.5) for a Gilbert-Elliott channel means
that the channel spent half of the time in each state. By
Proposition 1 we get

\[
E_{0,N}(\rho, Q_N, s^N) = -\Big[ \ln G_b(\rho) + \frac{n_g}{N} \ln \frac{G_g(\rho)}{G_b(\rho)} \Big], \tag{4}
\]

where for the i.i.d. input distribution,

\[
G_g(\rho) = \frac{1}{2^{\rho}} \Big( \varepsilon_g^{\frac{1}{1+\rho}} + (1-\varepsilon_g)^{\frac{1}{1+\rho}} \Big)^{1+\rho},
\qquad
G_b(\rho) = \frac{1}{2^{\rho}} \Big( \varepsilon_b^{\frac{1}{1+\rho}} + (1-\varepsilon_b)^{\frac{1}{1+\rho}} \Big)^{1+\rho}.
\]

Also, by a small modification to Gallager's derivation for
$E_{0,N}(\rho, Q_N, s_0)$ ((5.9.39) in [2]), we have

where $e(s_i)$ is the unit vector with a one in the $i$-th position,
and $a(g,g) = (1-\alpha) G_g(\rho)$, $a(g,b) = \alpha G_g(\rho)$, $a(b,g) =
\beta G_b(\rho)$, and $a(b,b) = (1-\beta) G_b(\rho)$. Notice that this bound
is the same as the bound computed by the sum given in (3)
using (4) and the joint distribution of $\eta_g$ and $s_N$, conditioned
on $s_0$.
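The equivalence claimed above can be checked numerically. The sketch below rests on several assumptions that are not stated in the text: inputs are equiprobable binary symbols (suggested by the $2^{-\rho}$ factor), the matrix form of the bound is taken to be the $(s_0, s_N)$ entry of $A^N$ scaled by $e^{\rho N R}$ with $A$ built from the entries $a(\cdot,\cdot)$, and $n_g$ counts the good states among $s_0, \ldots, s_{N-1}$. Function names are illustrative, and states are indexed 0 (good) and 1 (bad).

```python
import math


def G(eps, rho):
    """Per-state factor G_s(rho) for crossover eps and equiprobable inputs."""
    s = 1.0 / (1.0 + rho)
    return 2.0 ** (-rho) * (eps ** s + (1.0 - eps) ** s) ** (1.0 + rho)


def E0(rho, eta_g, eps_g, eps_b):
    """Exponent of (4) for a fraction eta_g of the block spent in the good state."""
    return -(math.log(G(eps_b, rho)) + eta_g * math.log(G(eps_g, rho) / G(eps_b, rho)))


def matrix_bound(N, R, rho, alpha, beta, eps_g, eps_b):
    """Entries [s0][sN] of exp(rho N R) * A^N (assumed matrix form of the bound)."""
    Gg, Gb = G(eps_g, rho), G(eps_b, rho)
    A = [[(1 - alpha) * Gg, alpha * Gg], [beta * Gb, (1 - beta) * Gb]]
    M = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(N):
        M = [[sum(M[i][k] * A[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
    scale = math.exp(rho * N * R)
    return [[scale * M[i][j] for j in range(2)] for i in range(2)]


def occupancy_joint(N, alpha, beta):
    """p[s0][sN][m] = Pr(n_g = m, s_N = sN | s_0 = s0) by dynamic programming."""
    P = [[1 - alpha, alpha], [beta, 1 - beta]]
    table = []
    for s0 in (0, 1):
        cur = [[0.0] * (N + 1) for _ in range(2)]
        cur[s0][0] = 1.0
        for _ in range(N):
            nxt = [[0.0] * (N + 1) for _ in range(2)]
            for s in (0, 1):
                shift = 1 if s == 0 else 0  # a slot spent in the good state
                for m in range(N + 1 - shift):
                    for t in (0, 1):
                        nxt[t][m + shift] += cur[s][m] * P[s][t]
            cur = nxt
        table.append(cur)
    return table


def type_sum_bound(N, R, rho, alpha, beta, eps_g, eps_b):
    """Sum over types of exp(-N (E0 - rho R)) weighted by the type distribution."""
    p = occupancy_joint(N, alpha, beta)
    return [[sum(p[i][j][m] * math.exp(-N * (E0(rho, m / N, eps_g, eps_b) - rho * R))
                 for m in range(N + 1))
             for j in range(2)] for i in range(2)]
```

Under these assumptions the two functions agree entry by entry up to floating-point error, because each state path contributes its transition probabilities times one factor $G_g$ or $G_b$ per slot. Minimizing either quantity over `rho` on a grid in $[0, 1]$ gives the tightest bound of this form.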

On the other hand, in the rare-transition regime in which
$N\alpha$ and $N\beta$ are constant, by taking the limit as $N \to \infty$, we
can compute the bound on failure probability (3), with a small
error (see [10]), as follows

\begin{align}
& \min_{0 \leq \rho \leq 1} \int_0^1 e^{N \left( \ln G_b(\rho) + x \ln \frac{G_g(\rho)}{G_b(\rho)} + \rho R \right)} f(x, s_N = d \mid s_0 = c) \, dx \notag \\
& \quad = \min_{0 \leq \rho \leq 1} \Big[ e^{N (\ln G_b(\rho) + \rho R)} \, G_{cd}\Big( N \ln \frac{G_g(\rho)}{G_b(\rho)} \Big) \Big], \tag{5}
\end{align}

where $G_{cd}(\cdot)$, $c, d \in \{g, b\}$, is the moment generating function
of the limiting occupancy time distribution and is given
by (13). By modifying Gallager's approach and considering
the rare-transition regime, we get an upper bound on error
probability of maximum-likelihood decoding that retains its
dependency on the initial and final states.

III. Derivation of Exact Decoding Error 
Probability 

In this section, we first present the exact expression for the
failure probability over the BSC and then derive the failure probability
over the Gilbert-Elliott channel to compare with the upper bound.

A. Random Coding Error Probability for the BSC

In [4], the error probability for random coding over BSC($p$)
using a maximum-likelihood (ML) decoder, for a system which
treats ties as errors, is derived as

In [5], some small modifications have been made to Fano's
expression to take ties into account as

One can see that this gives a small difference in the error
probability conditioned on the number of errors.

B. The Gilbert-Elliott Channel: State Known at the Receiver 

Now, we consider data transmission over the Gilbert-Elliott 
channel using random coding when the state is known at the 
receiver. Two different decoders are considered: a minimum 
distance decoder and a maximum-likelihood decoder. There 
are some differences between these two decoding rules which 
we will go through in detail in the following subsections. 



We then derive the decoding failure probability for the two
decoders, conditioned on the knowledge of the occupancy
times. It turns out that when the state is known at the receiver,
the empirical distribution of the channel state provides enough
information to determine the error probability. Using the
distribution of $n_g$ and the $n_g$-conditional error probabilities
for different decoding rules, one can average over all types to
get the probability of decoding failure,

\[
P_e = \sum_{T} P_{e \mid T} \Pr(s^N \in T),
\]

where $P_{e \mid T}$ is the probability of decoding error given that the
channel state sequence type is $T$, and $\Pr(s^N \in T)$ denotes
the probability distribution of channel type $T$, which will be
derived in the following section.

1) Minimum Distance Decoding: Given the channel type,
the minimum distance decoder on the Gilbert-Elliott channel
acts similarly to a maximum-likelihood decoder on the BSC.

Suppose we denote the number of errors in each state by
$e_g$ and $e_b$, where $e_g = d_H(\underline{X}_g, \underline{Y}_g)$ and $e_b = d_H(\underline{X}_b, \underline{Y}_b)$.
The conditional error probability can be written as

\[
P_{e \mid T} = \sum_{e_g = 0}^{n_g} \sum_{e_b = 0}^{n_b} P_{e \mid T, e_g, e_b} \, P_{e_g, e_b \mid T}, \tag{7}
\]

where $P_{e \mid T, e_g, e_b}$ is the error probability conditioned on type
$T$ and the number of errors in each state, and $P_{e_g, e_b \mid T}$ is
the probability of having $e_g$ and $e_b$ errors conditioned on $T$.
Conditioned on the channel type, the numbers of errors in the
good and bad states are independent. So,

\[
P_{e_g, e_b \mid T} = P_{e_g \mid T} \, P_{e_b \mid T},
\]

and we have

because, conditioned on $e_g$ and $e_b$, the probability of error is
independent of the channel state sequence type.

If there exists at least one codeword other than the transmitted
one inside the decoding sphere of radius $e_g + e_b$
centered at the received word, the bounded distance decoder
either decodes to an incorrect codeword or fails to decode.
This means ties and errors are grouped. Since there are
$M - 1$ codewords other than the transmitted one, the probability
that none of them falls inside the decoding sphere is

\[
\Big( 1 - 2^{-N} \sum_{i=0}^{e_g + e_b} \binom{N}{i} \Big)^{M-1},
\]

and the probability that at least one of them falls inside the ball is simply

\[
1 - \Big( 1 - 2^{-N} \sum_{i=0}^{e_g + e_b} \binom{N}{i} \Big)^{M-1}. \tag{11}
\]

Now, we can simply substitute all the terms in equation (7)
to obtain the error probability expression.

To take ties into account, rather than treating all tie
events as errors, we can slightly modify equation (11) in the
same way as ??.
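The sphere-counting step above is easy to evaluate numerically. The following sketch computes the probability that at least one of the $M - 1$ competing codewords, drawn uniformly from binary words of length $N$, falls inside a Hamming ball of a given radius. The function name and the use of `expm1`/`log1p` for numerical stability are choices made here, not part of the original derivation.

```python
import math


def prob_ball_hit(N, M, radius):
    """Probability that some of M - 1 uniform random binary words of length N
    lies within Hamming distance `radius` of a fixed received word."""
    radius = min(radius, N)
    p_single = sum(math.comb(N, i) for i in range(radius + 1)) / 2 ** N
    if p_single >= 1.0:
        # The ball covers the whole space, so any competitor lands inside it.
        return 1.0 if M > 1 else 0.0
    # 1 - (1 - p)^(M - 1), evaluated stably for small p and large M.
    return -math.expm1((M - 1) * math.log1p(-p_single))
```

For the minimum distance decoder, `radius` would be set to $e_g + e_b$, and the result would then be averaged over the distribution of error counts as in (7).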



2) Maximum-Likelihood Decoding:

Lemma 1. When the state is known at the receiver, the
maximum-likelihood decoder decodes to the following codeword

\[
\arg\max_{\underline{x} \in \mathcal{C}} \ln P(\underline{Y} \mid \underline{X}) = \arg\min_{\underline{x} \in \mathcal{C}} \big[ \gamma e_g + e_b \big],
\]

where $\gamma = \ln\frac{1-\varepsilon_g}{\varepsilon_g} \Big/ \ln\frac{1-\varepsilon_b}{\varepsilon_b} > 1$. Moreover, the error probability
conditioned on the number of errors in each state and the
channel state type is

(12)

where $C = \lceil \gamma e_g \rceil + e_b$.

Proof: The proof is given in Appendix VII-A. ■

IV. Distribution of Channel State Type for 
Gilbert-Elliott Channel 

The purpose of the current analysis is to study delay-sensitive
communication systems and evaluate their queueing
behavior. Since in these kinds of systems the block length cannot
grow very large, the effect of the initial
and final states does not vanish with $N$ for moderate
block lengths. So the error probability depends on the initial
and the final states ($s_0$ and $s_N$, respectively). Conditioned on
the channel type, the error probability is independent of the
initial and the final states. The only part that depends on these
states is the distribution of the state occupancies. In [11], [12],
the probability distribution of the occupation times for two-state
Markov chains is derived. However, the given distribution
is averaged over all final states. With some manipulation, we
derive the joint probability distribution of the state occupation
$n_g$ and the final state, conditioned on the initial state.

Theorem 3. The joint distribution of channel type and the final
state conditioned on the initial state can be computed as

\begin{align*}
\Pr(n_g = m, s_N = g \mid s_0 = g) &= (1-\alpha)^m (1-\beta)^{N-m} \\
&\quad \times \{ F(-N+m, -m; 1; \lambda) - F(-N+m+1, -m; 1; \lambda) \}, \\
\Pr(n_g = m, s_N = b \mid s_0 = g) &= \frac{(1-\alpha)^{m-1} (1-\beta)^{N-m+1} \alpha}{1-\beta} \, F(-N+m, -m+1; 1; \lambda), \\
\Pr(n_g = m, s_N = g \mid s_0 = b) &= \frac{(1-\alpha)^{m+1} (1-\beta)^{N-m-1} \beta}{1-\alpha} \, F(-N+m+1, -m; 1; \lambda), \\
\Pr(n_g = m, s_N = b \mid s_0 = b) &= (1-\alpha)^m (1-\beta)^{N-m} \\
&\quad \times \{ F(-N+m, -m; 1; \lambda) - F(-N+m, -m+1; 1; \lambda) \},
\end{align*}

for $0 < m < N$. $F(\cdot, \cdot; \cdot; \cdot)$ is the hypergeometric function,
$\lambda = \frac{\alpha\beta}{(1-\alpha)(1-\beta)}$, and $d = 1 - \alpha - \beta$. Moreover,
$\Pr(n_g = 0, s_N \mid s_0 = g) = 0$, $\Pr(n_g = N, s_N \mid s_0 = b) = 0$,
$\Pr(n_g = 0, s_N = b \mid s_0 = b) = (1-\beta)^N$, and
$\Pr(n_g = N, s_N = g \mid s_0 = g) = (1-\alpha)^N$.

Proof: The proof of the theorem is given in Appendix VII-B. ■

Notice that we can also compute these probabilities using
the generating matrix method. Let

By taking the $N$-th power of this matrix, the coefficient of the $m$-th
power of $x$ in the corresponding entry represents the above
conditional probability. However, this method does not directly give
a closed-form distribution of the occupation times.
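The two routes described above can be compared numerically. This is a sketch under stated assumptions: the generating matrix is taken to be $\begin{bmatrix} (1-\alpha)x & \alpha x \\ \beta & 1-\beta \end{bmatrix}$, that is, a factor $x$ is attached to every slot spent in the good state among $s_0, \ldots, s_{N-1}$, which is the convention consistent with the boundary cases listed in Theorem 3. The hypergeometric function with non-positive integer arguments is evaluated as a terminating series. All function names are illustrative.

```python
from math import comb


def hyp(a, b, z):
    """F(-a, -b; 1; z) for integers a, b >= 0 (a terminating Gauss series)."""
    return sum(comb(a, k) * comb(b, k) * z ** k for k in range(min(a, b) + 1))


def closed_form(N, m, alpha, beta):
    """Theorem 3 for 0 < m < N; keys are (s0, sN)."""
    lam = alpha * beta / ((1 - alpha) * (1 - beta))
    base = (1 - alpha) ** m * (1 - beta) ** (N - m)
    return {
        ("g", "g"): base * (hyp(N - m, m, lam) - hyp(N - m - 1, m, lam)),
        ("g", "b"): base * alpha / (1 - alpha) * hyp(N - m, m - 1, lam),
        ("b", "g"): base * beta / (1 - beta) * hyp(N - m - 1, m, lam),
        ("b", "b"): base * (hyp(N - m, m, lam) - hyp(N - m, m - 1, lam)),
    }


def generating_matrix_power(N, alpha, beta):
    """Coefficient lists of x^m in each entry of the N-th matrix power."""
    P = {"g": {"g": 1 - alpha, "b": alpha}, "b": {"g": beta, "b": 1 - beta}}
    table = {}
    for s0 in "gb":
        cur = {s: [0.0] * (N + 1) for s in "gb"}
        cur[s0][0] = 1.0
        for _ in range(N):
            nxt = {s: [0.0] * (N + 1) for s in "gb"}
            for s in "gb":
                shift = 1 if s == "g" else 0  # a good slot multiplies by x
                for m in range(N + 1 - shift):
                    for t in "gb":
                        nxt[t][m + shift] += cur[s][m] * P[s][t]
            cur = nxt
        for sN in "gb":
            table[(s0, sN)] = cur[sN]
    return table
```

For moderate $N$ the polynomial recursion costs on the order of $N^2$ operations and avoids evaluating hypergeometric sums, while the closed form is what makes the rare-transition limit tractable.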

To get the distribution of the fractional occupation time
$\eta_g$, we consider the rare-transition regime in which the
transition probabilities are scaled with $N$ as $\alpha_N = \alpha / N$ and
$\beta_N = \beta / N$. In this regime, the expected number of transitions
in each length-$N$ block is constant. Taking the limit of the
above conditional probabilities for the DTMC as $N \to \infty$
gives us the distribution of the fractional occupancy time
$x = \lim_{N \to \infty} n_g / N$. Notice that the transition-rate matrix of the
CTMC is

\[
Q = \begin{bmatrix} -\alpha & \alpha \\ \beta & -\beta \end{bmatrix}.
\]

For the DTMC in the rare-transition regime, the probability transition matrix is

\[
P_N = I + \frac{1}{N} Q,
\]

where $I$ is the 2 by 2 identity matrix. So,

\[
\lim_{N \to \infty} P_N^N = e^{Q}.
\]

This means that the rare-transition limit of the DTMC
yields the corresponding results for the CTMC. Similarly,

where $G_{g,N}$ is obtained by replacing $\alpha$ and $\beta$ in $G_g(\cdot)$
with $\alpha_N$ and $\beta_N$, gives the corresponding matrix generating
function.
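The convergence of the scaled DTMC to the CTMC can be observed directly for the two-state chain. The sketch below assumes the rate matrix $Q$ with off-diagonal rates $\alpha$ and $\beta$ and uses the standard closed form of $e^{Q}$ for a two-state chain. The function names and the parameters `a` and `b` (standing for $\alpha$ and $\beta$) are illustrative.

```python
import math


def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dtmc_block_transition(a, b, N):
    """N-step transition matrix of the DTMC with alpha_N = a / N, beta_N = b / N."""
    power = [[1 - a / N, a / N], [b / N, 1 - b / N]]
    result = [[1.0, 0.0], [0.0, 1.0]]
    n = N
    while n:  # exponentiation by squaring
        if n & 1:
            result = mat_mul(result, power)
        power = mat_mul(power, power)
        n >>= 1
    return result


def ctmc_unit_time_transition(a, b):
    """exp(Q) for Q = [[-a, a], [b, -b]], using the two-state closed form."""
    decay = math.exp(-(a + b))
    total = a + b
    return [[(b + a * decay) / total, (a - a * decay) / total],
            [(b - b * decay) / total, (a + b * decay) / total]]


def max_abs_gap(a, b, N):
    """Largest entrywise gap between the N-step DTMC matrix and exp(Q)."""
    D, C = dtmc_block_transition(a, b, N), ctmc_unit_time_transition(a, b)
    return max(abs(D[i][j] - C[i][j]) for i in range(2) for j in range(2))
```

With `a = 0.04` and `b = 0.06`, the values used in the numerical section, the gap shrinks roughly in proportion to $1/N$.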

Lemma 2. The joint distribution of fractional occupancy times
and the final state conditioned on the initial state can be
computed as

Proof: The proof can be found in Appendix VII-C. ■

V. Numerical Results 

In this section we present the numerical results for a system
which transmits data over the Gilbert-Elliott channel with
$\varepsilon_g = 0.01$, $\varepsilon_b = 0.1$. Fig. 2 compares our derived bound
in (5) with the Gallager-type bound (3) in the rare-transition regime
where $N\alpha = 0.04$ and $N\beta = 0.06$. The plots show the
failure probability averaged over all state transitions. As we
can see, although the block lengths are short, the bounds given
by (5) are very close to (3) while keeping a simpler form. In
Fig. 3 we compare the exact results for maximum-likelihood
and minimum distance decoders with our derived upper bound
given by (5) for fixed transition probabilities. As we expect, the
maximum-likelihood decoder outperforms the minimum distance
decoder. Moreover, as the block length increases, the bound
gets closer to the exact value.

Figure 2. Comparison of our derived bound in (5) with the Gallager-type bound
(3) in the rare-transition regime for $N\alpha = 0.04$ and $N\beta = 0.12$.

Figure 3. Comparison of our derived bound in (5) with the exact values for
maximum-likelihood and minimum distance decoders for $\alpha = 0.0533$ and
$\beta = 0.08$.

VI. Conclusion 

We proposed a general approach to bound the failure 
probability for random coding over finite-state channels which 
retains the dependency on the initial and final states for 
relatively short block-lengths. We also derived expressions to 
compute the exact error probability for maximum-likelihood 
and minimum distance decoders for the Gilbert-Elliott channel and
we compared them with the proposed upper bound.

VII. Appendix

A. Proof of Lemma 1

First we revisit the ML decoding rule for the Gilbert-Elliott
channel, when the state is known at the receiver and
conditioned on the channel state type ($n_g$). We have

\[
P(\underline{Y} \mid \underline{X}) = \varepsilon_g^{e_g} (1-\varepsilon_g)^{n_g - e_g} \, \varepsilon_b^{e_b} (1-\varepsilon_b)^{N - n_g - e_b}.
\]

Upon receiving the word $\underline{Y}$, the ML decoder decodes to
the codeword $\underline{X}$ that maximizes $P(\underline{Y} \mid \underline{X})$. Equivalently, after
taking logarithms and dropping terms that do not depend on the
codeword, the decoded message will be
$\arg\min_{\underline{x} \in \mathcal{C}} [\gamma e_g + e_b]$. It means that
for the ML decoder, the errors in the bad state do not cost
the system as much as the errors in the good state. That is
because the receiver expects more errors in the bad state.

To get the error probability for the ML decoder, similar to the
minimum distance decoder, we first condition on $e_g$ and $e_b$.
Then for each pair $e_g$ and $e_b$ we define $C = \lceil \gamma e_g \rceil + e_b$.
Then, the probability of error conditioned on $n_g$, $e_g$, and $e_b$ is
obtained as (12).
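The weighted decoding rule can be illustrated with a toy codebook. This is a minimal sketch assuming $0 < \varepsilon_g < \varepsilon_b < 1/2$, with the weight $\gamma$ obtained as the ratio of the two log-likelihood ratios that appear when the logarithm of the likelihood above is expanded. The function names, the state labels, and the example codebook are hypothetical.

```python
import math


def ml_metric_weight(eps_g, eps_b):
    """Relative cost of a good-state error versus a bad-state error."""
    return math.log((1 - eps_g) / eps_g) / math.log((1 - eps_b) / eps_b)


def error_counts(x, y, states):
    """Number of disagreements between x and y in good and in bad slots."""
    e_g = sum(1 for xi, yi, s in zip(x, y, states) if s == "g" and xi != yi)
    e_b = sum(1 for xi, yi, s in zip(x, y, states) if s == "b" and xi != yi)
    return e_g, e_b


def ml_decode(codebook, y, states, eps_g, eps_b):
    """Index of the codeword minimizing gamma * e_g + e_b (state known)."""
    gamma = ml_metric_weight(eps_g, eps_b)

    def metric(x):
        e_g, e_b = error_counts(x, y, states)
        return gamma * e_g + e_b

    return min(range(len(codebook)), key=lambda i: metric(codebook[i]))


def min_distance_decode(codebook, y):
    """Index of the codeword closest to y in plain Hamming distance."""
    return min(range(len(codebook)),
               key=lambda i: sum(a != b for a, b in zip(codebook[i], y)))
```

As a usage example, take the codebook `[[0, 0, 0, 0], [1, 0, 1, 1]]`, the state sequence `["g", "g", "b", "b"]`, and the received word `[0, 0, 1, 1]` with $\varepsilon_g = 0.01$ and $\varepsilon_b = 0.1$. The first codeword differs in two bad-state positions and the second in one good-state position. Since $\gamma$ exceeds 2 for these parameters, the ML rule selects the first codeword while the minimum distance rule selects the second.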

B. Proof of Theorem 3

From [12] we have



Pr(n g 



l | So = g) = (i-«) m (i-/3) / 



6-1 



1-/3 



I- a 



where a and b are the number of transitions into the initial 
state and out of the initial state, respectively, c is the total 
number of transitions which occur up to time N, and 



ci 




2m 



N 



m < N 
m = N 



By splitting the summation into terms for which $c$ is odd and
$c$ is even, it follows that if $c = 2k$, then $a = b = k$ and the
corresponding sum represents $\Pr(n_g = m, s_N = g \mid s_0 = g)$. If
$c = 2k + 1$, then $a = k$, $b = k + 1$, and the corresponding sum
represents $\Pr(n_g = m, s_N = b \mid s_0 = g)$. So we have



Pr(n g =m|so = g) 



{l-a) m (l-/3) N - m J2 

k 



rn\f TV — m — 1 



k j\ k-l 
m\f TV — m — 1 



1-/3/ \i-q, 
fe+i 



1-4 



l-OL 



We can set the lower and upper limits on $k$ to $0$ and $\infty$, since
all other terms are automatically zero. From the definition of
$F(-N+m+1, -m; 1; \lambda)$ we see that



a 



m\f N — m— 1\/ a /3 
k J\l-f3 ~ a. 



a 



1-4 



F(-iV+m+l, -to; 1; A) 



So we have 



Now, we want to compute 



Pr(n g = m, Sjv =b| So = g) = (l-a) m (l- /3)"~ m x 



lim Pr(n a = m, sn = blsn = g) 

/V->oo y 



1-4 



F(-iV+m+l, -m;l;A), 



lim 

N-yoo 



for < ?7i < iV. Clearly, for m = and m = JV", this 
conditional probability equals to 0. 
By lfl2ll and noticing that 

Pr(n g = m|s = g) = Pr(n g = m, s N = g\s = g) 
+ Pr(n g = m, s N = b\s = g), 



v n' v n' 



@\N-m ( N 
1-^ 



|F(-iV+m+l,-m;l;A') 



A7 



_ A 

where Xm — -, ^■rf- — g-r. First consider 

lim F -JV+m+l,-m;l; 



iV N 



N- 



(1-f) (l-l) 



we have 

Pr(n g = m, Sjv = g|s = g) = (1 - a) m (1 - f3) N - m x 

(F(-JV+m,-m;l;A)-d(l-/9) _1 
xF(-JV+m+l, -m; 1; A)} 
- (l-a) ro (l-/5) iV - m x 

^^(-JV+m+l.-mjljA) 

= (l-a) m (l-/5) JV - m x 
{_F(-iV"+m, -m; 1; A) 
-F(-JV+m+l, -m; 1; A)}, 

for $0 < m < N$. Moreover, $\Pr(n_g = 0, s_N = g \mid s_0 = g) = 0$
and $\Pr(n_g = N, s_N = g \mid s_0 = g) = (1-\alpha)^N$.

The other conditional probabilities can be derived in the
same manner.

C. Proof of Lemma 2

In [12], Pedler considers a Markov chain with two states
and continuous time parameter set $[0, t]$. Then he defines the
occupation time $X(t)$ as the time spent in the first state (the
good state) during the interval $[0, t]$, and derives the PDF
(probability density function) of $X(t)$, called $f(x, t)$. It has
been shown that

\begin{align*}
f(x, t) = e^{-\alpha x - \beta (t - x)} \Big\{ & \pi_g \delta(t - x) + \pi_b \delta(x) \\
& + \Big[ \pi_g \Big( \frac{\alpha \beta x}{t - x} \Big)^{1/2} + \pi_b \Big( \frac{\alpha \beta (t - x)}{x} \Big)^{1/2} \Big] I_1\big[ 2 (\alpha \beta x (t - x))^{1/2} \big] \\
& + (\pi_g \alpha + \pi_b \beta) I_0\big[ 2 (\alpha \beta x (t - x))^{1/2} \big] \Big\},
\end{align*}

where $\pi_g$ and $\pi_b$ are the steady-state probabilities of being in the
good state and the bad state, respectively. $I_0(\cdot)$ and $I_1(\cdot)$ are
the modified Bessel functions of order 0 and 1, respectively.

First, we put $t = 1$ in this formula to normalize the time
interval and have the distribution of fractional occupancy times
with respect to $N$, $f(x)$. Then we rewrite this PDF as

\begin{align*}
f(x) &= \pi_g f(x \mid s_0 = g) + \pi_b f(x \mid s_0 = b) \\
&= \pi_g [ f(x, s_N = g \mid s_0 = g) + f(x, s_N = b \mid s_0 = g) ] \\
&\quad + \pi_b [ f(x, s_N = g \mid s_0 = b) + f(x, s_N = b \mid s_0 = b) ].
\end{align*}



By the definition of F(—N+m+l, —to; 1; A) this limit equals 
to 



i 



= m 



lim 



(N-m-k)- ■ -(iV-m-l)x(m-(fc-l)) • • -to(A') 
m k 



lim 

iV-to 
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N N 



m 1 



to k— 1 
~N N~ 



Pi 



because lim 

i 

N 



N->oo 1 — — J-, nmjv 
limjv^oo 4^ = for i = 1, 2, . . . , k. On the other hand, we 
know that 

(H 



1, lim? 



m i ah 



1 - $ = 1, and 



Zo(*) = £ 



is the zero-th order modified Bessel function. So, 

N N 



lim F — N+m+1, — m: 1: — 



= Jo (2V(l-i)iaj?) 
Moreover, 



lim 

N-t-oc 



/-i _ ^ \ mft _ @ \ N—m 



_ e —ax—P(l—x) 



So, 



lim (iVPr(n g = m, = b|so = g)) 

iV— >oo 



= /(.t, sat = b|s = g) 

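However, from the first principles of probability and by [12],
when normalizing the time interval and putting $t = 1$ in the
corresponding distributions for the CTMC, we know that

\begin{align*}
& f(x, s_N = b \mid s_0 = g) + f(x, s_N = g \mid s_0 = g) \\
& \quad = f(x \mid s_0 = g) \\
& \quad = e^{-\alpha x - \beta (1 - x)} \Big\{ \delta(1 - x) + \Big( \frac{\alpha \beta x}{1 - x} \Big)^{1/2} I_1\big[ 2 (\alpha \beta x (1 - x))^{1/2} \big]
+ \alpha I_0\big[ 2 (\alpha \beta x (1 - x))^{1/2} \big] \Big\}.
\end{align*}

So, we can easily see that

\[
f(x, s_N = g \mid s_0 = g) = e^{-\alpha x - \beta (1 - x)} \Big\{ \delta(1 - x) + \Big( \frac{\alpha \beta x}{1 - x} \Big)^{1/2} I_1\big[ 2 (\alpha \beta x (1 - x))^{1/2} \big] \Big\}.
\]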

D. A second approach to derive the conditional distributions

In [12], the derivation of the PMF and PDF of the
occupancy times for the DTMC and CTMC has been done through
the computation of the corresponding bivariate generating function
and two-dimensional Laplace transform, respectively.

First consider the DTMC. The bivariate generating function
of the time spent in the good state, averaged over all initial
and final states, is shown to be



ip(u,x) = [ n g 7r b ] [I -uG g (x)}~ 



In fact, the matrix [I — uG g (x)] 1 is the bivariate generating 
matrix of the time spent in the good state. For example the 
first entry in the matrix is 



ip gg (u,x) = [ 1 ] [I - uG^x)]- 1 



1 — (1 — a)ux — (1 — f3)u + du 2 x 

1 - (1 - (3)u 
1 — (1 — a)w — (1 — 0)u + duw ' 



Putting w = ux, 

^(u,w) - 

1 - (1 - IX j 

and Pr(n q = m, Sjv = g|so = g) is obtained by expanding 
fy(u,w) as a power series in positive powers of u and w. 
Lemma 1 in [12] helps to get the desired format in terms of
hypergeometric functions.

For the CTMC, the matrix of two-dimensional Laplace
transforms of the PDF of time spent in the good state during
the time interval [0,1] is



Q 



o 



For example the first entry in the matrix equals to 

1 a/3 



u u(uv — a/3) ' 

4>+9+a, and v — <p+/3 and the inverse of this two- 

g|so 



where u 

dimensional Laplace transform gives f(x,sw = g\so = g). 
Lemma 2 in [12] helps to get the desired format in terms of
modified Bessel functions.



